CHAPTER I


INTRODUCTION


The Fourier Transform and the Fourier series are important concepts
in the field of signal analysis and spectral characterisation. The Discrete
Fourier Transform (DFT) is a transform in its own right, like the Fourier
integral transform or the Fourier series transform. It defines a spectrum
of a time series, and the DFT of a time series of equally spaced samples is
closely related to the Fourier Transform; the DFT is therefore mainly useful
for power spectrum analysis and filter simulation on digital computers. If
the time series consists of N samples, then about $N^2$ real multiplications
and additions are required to compute the coefficients.


The Fast Fourier Transform (FFT) is a highly efficient algorithm
developed by Cooley and Tukey for computing the DFT of a time series [1].
It takes advantage of the fact that the calculation of the coefficients
of the DFT can be carried out iteratively, which results in a considerable
saving of computation time. For the same number of samples N, it requires
only about $N \log_2 N$ arithmetic operations.
The Discrete Walsh Transform (DWT), like the DFT, can also be used to
define the spectrum of a time series, and requires considerably less
computation, since the Walsh functions, which constitute a complete set of
orthonormal functions like the sine and cosine functions, can only take the
values 1 and -1, thus eliminating multiplication totally. In Walsh
transforms, different signals are characterised by their sequency spectrum
rather than by the frequency spectrum. Especially in the case of sampled
data systems, the description of sampled signals in the sequency domain is
advantageous since the sequency spectrum of such signals is finite.

The Fast Walsh Transform (FWT) [2] is also an algorithm making the
computation of the Walsh Transform very fast. It has been the subject of
several studies in the past few years, even though the basic theory was
established as early as 1958 by Good. The FWT has an inherent computational
advantage over the FFT: the FWT requires only real addition and subtraction
operations, while the FFT requires complex multiplications. This difference
results in a significant hardware simplicity advantage and a possible speed
advantage for the FWT.

The sequency spectrum is dependent on the phase relation between
the signal and the sampling signal. With audio signals, the phase of
the signal is relatively unimportant for carrying information, but the
phase completely changes the sequency spectrum. This should be compared
with the frequency spectrum of an audio signal, where the majority of the
information is contained in the amplitude of the spectrum and very little
in the phase of the components. This makes the FFT a more useful tool than
the FWT for audio applications.
Video signals, on the other hand, have two characteristics that make
them compatible with the Walsh Transform. The transient characteristics
(and hence the phase) of a video signal are important, and they can be
represented exactly by a Walsh Transform. Also, a video signal is divided
into time slots by the line scanning process, and this is a natural
subdivision for a Walsh Transformation. So the Walsh Transform is a specific
example of the generalised harmonic analysis approach to image
classification, whereas the Fourier Transform is useful for audio signal
processing. Moreover, the FWT can lead to powerful, yet relatively simple,
Digital Walsh Filters which can be used for image processing, where image
enhancement and data compression can be achieved by suitable filters. The
two-dimensional Walsh Transform has already been used for picture
transmission in Japan, and it may not be long before 3-dimensional picture
transmission using a 3-dimensional Walsh Transform is seen.

Kanade developed an Arithmetic Unit (AU) for FFT using two's complement
serial arithmetic, where the hardware basically consists of four multipliers,
one adder and one subtractor along with associated registers and control
circuits. He has also presented an 'Increment Logic' for the weight
generation in FFT, where the ROM required for storing the weights in the
conventional method has been replaced by a smaller ROM for just storing
the increments. This thesis is an extension of the above work, the objective
being (i) to incorporate the generation of the trigonometric weights in the
FFT AU itself using counter-decoder synthesis, (ii) to build a separate
AU for FWT and (iii) to combine the control of both FFT and FWT in one
control unit. The weight generation scheme and the AU for FFT are described
in the second chapter, followed by a description of the FWT AU, its
hardware and operation in the third chapter. The fourth chapter deals with
the principles and the hardware realisation of the main control unit to
control the address and data flow in both FFT and FWT.



CHAPTER II


ARITHMETIC UNIT FOR FFT

The FFT is a clever computational technique of sequentially combining
weighted sums of data samples so as to produce the DFT coefficients. The
mathematical background of the FFT and the arithmetic unit, as well as the
scheme for the generation of the trigonometric weights, are discussed in
this chapter.


2.1 Mathematical background of FFT

The DFT of a time series of equally spaced samples is closely related to
the Fourier Transform of the corresponding continuous signal. The $r^{th}$
coefficient of the DFT is given by

$$A_r = \sum_{k=0}^{N-1} X_k W^{rk}, \qquad r = 0, 1, \ldots, N-1$$    (2.1)

where $X_k$ is the $k^{th}$ sample of the time series, which consists of N
samples, and

$$W = \exp(-2\pi j/N)$$    (2.2)

is the trigonometric weight.
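
As a minimal sketch, the defining sum of eq. (2.1) can be evaluated directly in Python. The function name `dft` and the symbol names in the comments are illustrative assumptions, not part of the thesis. The nested loops make the count of roughly N^2 multiply-add steps visible.

```python
import cmath

def dft(samples):
    """Evaluate the DFT sum directly, one coefficient per value of r."""
    n_points = len(samples)
    # Trigonometric weight W = exp(-2*pi*j/N), as in eq. (2.2).
    w = cmath.exp(-2j * cmath.pi / n_points)
    coefficients = []
    for r in range(n_points):
        total = 0j
        for k, sample in enumerate(samples):
            total += sample * w ** (r * k)   # one complex multiply-add
        coefficients.append(total)
    return coefficients

# A constant series has all its energy in the r = 0 coefficient.
constant_spectrum = dft([1, 1, 1, 1])
```

For an N-point series the inner statement runs N times for each of the N coefficients, which is the cost the FFT is meant to reduce.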
As described previously, the FFT is nothing but the Cooley-Tukey
algorithm, which is based on the property of factoring the record of
samples, taking the DFT of the factors and then combining the results.
For an N-point FFT, where $N = 2^n$, n being an integer, there are n
iterations, each iteration consisting of $N/2$ basic computations, resulting
in a total of $(N/2) \log_2 N$ computation cycles. Each such cycle consists
of a typical pattern called the 'butterfly pattern', which is nothing but the
generation of a two-point transform according to the transform equations

$$D_1' = D_1 + W D_2 \quad \text{and} \quad D_2' = D_1 - W D_2,$$

where $D_1$ and $D_2$ are complex numbers fetched from the sample locations
in the memory and $D_1'$ and $D_2'$ are the new values generated by the
transformation, which are to be loaded back in the same sample locations.
This basic cycle requires one complex multiplication, one complex addition
and one complex subtraction. The actual hardware realisation for this
arithmetic operation is schematically shown in Fig. 2.1, where
$D_1 = a+jb$, $D_2 = c+jd$ and $W = e+jf$. For an N-point transform, $N/2$
complex weights are required. The sequence of operation is illustrated by
the signal flow graph for an 8-point FFT in Fig. 2.2. From the diagram it
is clear that each butterfly requires a pair of addresses, that these
addresses follow a repetitive logical pattern, and that the sequence of the
addresses is different in each iteration.
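
The iteration structure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis hardware: it assumes the common Cooley-Tukey arrangement with input in natural order and output in bit-reversed order, and the names `bit_reverse` and `fft_in_place` are hypothetical. Each pass of the innermost loop is one butterfly: one complex multiplication, one addition and one subtraction, with the results written back to the same two locations.

```python
import cmath

def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def fft_in_place(data):
    """Radix-2 FFT on a list of length 2**n; output is in bit-reversed order."""
    n_points = len(data)
    n = n_points.bit_length() - 1
    assert 1 << n == n_points, "length must be a power of two"
    for stage in range(n):                      # n iterations
        half = n_points >> (stage + 1)          # distance between the pair
        for block in range(1 << stage):
            # Weight exponents appear in bit-reversed order of the block index.
            exponent = bit_reverse(block, n - 1)
            weight = cmath.exp(-2j * cmath.pi * exponent / n_points)
            base = 2 * half * block
            for first in range(base, base + half):
                second = first + half
                d1, d2 = data[first], data[second]
                product = weight * d2           # one complex multiplication
                data[first] = d1 + product      # one complex addition
                data[second] = d1 - product     # one complex subtraction
    return data
```

With this arrangement every iteration performs N/2 butterflies, and the coefficient of index r ends up at the location whose address is r with its n bits reversed.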
Fig. 2.1 FFT AU CONTROL FLOW DIAGRAM
2.2 'Increment Logic'

It can be generalised from the signal flow graph illustrated in
Fig. 2.2 that all the weights are used only in the last iteration, and that
the pattern in which they occur remains the same for all earlier iterations.
Hence, as far as the sequence of occurrence of weights is concerned, it is
sufficient to analyse only the pattern in the last iteration. For an
N-point transform, $N/2$ complex weights are required, and the pattern of the
occurrence of these weights in the last iteration is given in Table 1,
which also shows the increments $\Delta r$ in the DFT frequency leading to
the next required weight.

The increment $\Delta r = N/4$ means that the ratio of the new weight to the
previous weight is given by

$$\exp(-j\pi/2) = -j$$

and hence the new weight can be generated by just interchanging the real
and imaginary parts of the previous weight and changing the sign of the
imaginary part. No new value of increment is thus necessary for this.
Let us denote the successive distinct values of the increment $\Delta r$,
leaving out $\Delta r = N/4$, by $k_i$, where $i = 1, 2, \ldots, n-2$. Then
the general expression for $k_i$ can be obtained from the inspection of
Table 1 as

$$k_i = \frac{3N}{2^{i+2}} - \frac{N}{2}, \qquad i = 1, 2, \ldots, n-2$$    (2.3)
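
The increments can be checked numerically with a short Python sketch. It assumes, as an illustration only, that the weights of the last iteration occur in bit-reversed order of the computation number (r = 0, N/4, N/8, 3N/8, ...), and that the closed form for $k_i$ is $3N/2^{i+2} - N/2$; the function names are hypothetical.

```python
def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def last_iteration_increments(n):
    """Delta-r between successive weights of the last iteration, N = 2**n."""
    exponents = [bit_reverse(count, n - 1) for count in range(1 << (n - 1))]
    return [later - earlier for earlier, later in zip(exponents, exponents[1:])]

def k_increment(i, n):
    """Closed form assumed for the i-th distinct increment."""
    n_points = 1 << n
    return 3 * n_points // (1 << (i + 2)) - n_points // 2

# For N = 32 the first increments are N/4, -N/8, N/4, -5N/16, ...
increments_32 = last_iteration_increments(5)
```

Apart from the interchange value N/4, the set of values returned for a given n contains exactly n-2 distinct increments.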
TABLE 1

Computation
number            r            Δr

    1             0             -
    2            N/4           N/4
    3            N/8          -N/8
    4           3N/8           N/4
    5            N/16        -5N/16
    6           5N/16          N/4
    7           3N/16         -N/8
    8           7N/16          N/4
    9            N/32       -13N/32

Distinct values of Δr:  N/4, -N/8, -5N/16, -13N/32, ...


TABLE 2

Pattern of Occurrence of Distinct Increments

Number of   Number of    Distinct   Distinct     Sequence of Distinct
Points      Iterations   Weights    Increments   Increments

    8           3           2           1        k1
   16           4           4           2        k1 k2 k1
   32           5           8           3        k1 k2 k1 k3 k1 k2 k1
    .           .           .           .        .
    .           .           .           .        .
N = 2^n         n          N/4         n-2       k1 k2 k1 ... k(n-2) ... k1 k2 k1
The pattern in which these distinct increments are required in the
last iteration is shown in Table 2 for various values of N. The same
pattern follows for any other iteration with appropriate truncation of the
sequence.

Eqn. (2.3) shows that, with $\theta = 2\pi/N$, $k_i\theta$ lies in the
fourth quadrant for the first increment, while for the other increments it
lies in the third quadrant. Hence the imaginary part of $\exp(-jk_i\theta)$
is always positive, and its real part is always negative except for the
first increment. Moreover, only n-2 distinct increments are required for a
$2^n$-point FFT. If the number of points is doubled, then only one more
distinct value of the increment is required, as compared to $N/4$ distinct
weights required in the conventional method. For each computation cycle,
the new weight can be generated by the AU itself, which is capable of doing
complex multiplication, complex addition and complex subtraction, provided
the appropriate value of $\exp(-jk_i\theta)$ is available. The additional
algorithm for FFT using this 'Increment Logic' is given in Fig. 2.4 in the
form of a flow chart, where

$$\exp(-jk_i\theta) = g + jh$$

This additional computation has to be performed before starting
the usual butterfly computation shown in Fig. 2.5, in order to generate the
required new weight. However, this additional step can be skipped for
those computation cycles where the new weight is either the same as that in
the preceding cycle or can be generated from the latter by interchanging
the real and imaginary parts.
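
A rough Python model of this weight generation for the last iteration is given below. It is a sketch under assumptions, not the flow chart of Fig. 2.4: it assumes $\theta = 2\pi/N$, the closed form $k_i = 3N/2^{i+2} - N/2$, and a bit-reversed order of weight exponents; all function names are hypothetical. Only the n-2 factors $\exp(-jk_i\theta)$ are stored; every other weight is obtained from its predecessor by an interchange or by one complex multiplication.

```python
import cmath

def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def reference_weights(n):
    """Weights of the last iteration computed directly as W**r."""
    n_points = 1 << n
    return [cmath.exp(-2j * cmath.pi * bit_reverse(count, n - 1) / n_points)
            for count in range(n_points // 2)]

def incremental_weights(n):
    """Same weights, generated from the previous weight at each step."""
    n_points = 1 << n
    theta = 2 * cmath.pi / n_points
    factors = {}                                  # the n-2 stored increments
    for i in range(1, n - 1):
        k_i = 3 * n_points // (1 << (i + 2)) - n_points // 2
        factors[i] = cmath.exp(-1j * k_i * theta)
    weight = 1 + 0j
    weights = [weight]
    for count in range(1, n_points // 2):
        if count & 1:
            # Interchange: multiply by -j, i.e. (e + jf) -> (f - je).
            weight = complex(weight.imag, -weight.real)
        else:
            # Index of the lowest set bit selects the distinct increment.
            i = (count & -count).bit_length() - 1
            weight = weight * factors[i]
        weights.append(weight)
    return weights
```

Because each weight is derived from the previous one, truncation errors accumulate along the sequence; with floating-point arithmetic the effect is negligible, whereas a fixed-point AU would need the word length discussed in the thesis.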




Fig. 2.4 ADDITIONAL AU CONTROL FLOW DIAGRAM FOR NEW WEIGHT GENERATION


2.3 Computation Counter

A binary computation counter is used to count the number of
computations performed in each iteration, and this counter overflows at the
end of each iteration. For a $2^n$-point FFT, $2^{n-1}$ computations
constitute each iteration, and n-1 bits are therefore required for this
counter. For a 2K-point transform, a 10-bit counter is required. This
counter is just a ripple counter with the output of each flip-flop connected
as the clock input of the next. The overflow of a particular bit is detected
using an AOI gate corresponding to the required number of points of FFT/FWT.

2.4 Choice of Word Length

The number of bits required to represent the word should be chosen
for the necessary resolution, and this number should be compatible with the
number of bits required to represent the weight. The different values of
distinct weights should differ at least by 1 bit in any discrete m-bit
representation for the latter to be meaningful for a given value of N.
From Table 1, it can be seen that $k_i$ tends to $-N/2$, i.e. $k_i\theta$
tends to $-\pi$, as i increases. As a result, the difference between the
arguments of successive distinct factors $\exp(-jk_i\theta)$ used in the
generation of new weights tends towards zero as i increases. It is therefore
necessary to ensure that the bit representation is at least capable of
distinguishing the last two increments. Moreover, if a minimal overall
accuracy has to be guaranteed, the distinct increments should have even
greater resolution.
Now, from eqn. (2.3),

$$k_{n-3} - k_{n-2} = 3$$    (2.4)

Hence the ratio between the last two distinct factors used in the generation
of weights is given by

$$F = \frac{\exp(-jk_{n-3}\theta)}{\exp(-jk_{n-2}\theta)} = \exp(-j3\theta)$$    (2.5)

For large values of N, $\theta \to 0$ and hence

$$F \approx 1 - j3\theta = 1 - j\frac{6\pi}{N}$$    (2.6)

Let the required resolution demand that the smallest change in $k_i$
should produce an effect at least in the last x bits in the computed value
of a distinct new weight. Clearly, then, for an m-bit representation to be
compatible with a $2^n$-point FFT, F must differ from unity at least in the
last x bits of the imaginary part. This requires the following condition
to be satisfied:

$$\frac{6\pi}{N} \ge \frac{2^x}{2^{m-2}}$$    (2.7)

as the magnitude has to be represented by m-2 bits, allowing 1 bit for sign
and 1 bit for overflow, or

$$m \ge \log_2 N + x + 2 - \log_2 6\pi \approx n + x - 2$$    (2.8)

Table 3 gives the minimum requirement for 1 percent resolution.
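
The word-length condition can be evaluated with a few lines of Python. This is a sketch under an assumption: the value x = 7 is not stated in the text and is chosen here because $2^{-7}$ is roughly 1 percent and because it reproduces the word lengths of Table 3; the function name is hypothetical.

```python
import math

def min_word_length(n, x):
    """Smallest integer m satisfying m >= n + x + 2 - log2(6*pi), N = 2**n."""
    return math.ceil(n + x + 2 - math.log2(6 * math.pi))

# x = 7 is an assumed resolution parameter (2**-7 is about 0.8 percent).
word_lengths = {2 ** n: min_word_length(n, 7) for n in (5, 9, 11)}
```

Since $\log_2 6\pi$ is about 4.24, the bound behaves like n + x - 2 after rounding up, so each doubling of N costs one more bit.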
2.5 Interchange Command Generation for a 2K-point FFT

It has been seen in section 2.2 that in some cases the new weight
can be generated from the previous weight by giving the 'Interchange
Command'. From an extension of Table 1, the pattern in which this command
occurs is easily obtained, as shown in Table 4, in terms of the iteration
number I and the computation number. The latter may be represented by the
states $Q^C_0, Q^C_1, \ldots$ (where $Q^C_0$ represents the LSB, $Q^C_1$ the
next significant bit and so on) of the flip-flops constituting the
computation counter. Each state of the computation counter corresponding to
a particular combination of values of $Q^C_0, Q^C_1, \ldots$ is henceforth
referred to as a 'CC setting'.

2.6 Increment Command Generation for a 2K-point FFT

This command is generated whenever a new weight has to be generated
from the previous weight using the 'Increment Logic'. There are n-2
distinct increments, and the pattern in which they occur for a 2K-point FFT
in terms of I and CC setting is given in Table 5.

'X' indicates a don't care condition, with the qualification that no
command should come when the CC setting is 0000000000.

The patterns shown in Tables 4 and 5 can be easily extended to arrive
at the following generalised rule about the occurrence of the interchange
and increment commands in terms of the state transitions of the computation
counter.
TABLE 3

Word Length for 1 percent Resolution

    N        n        m

    32       5       10
   512       9       14
  2048      11       16


TABLE 4

Pattern of Occurrence of Interchange Command

                      CC Setting                       No. of
  I    Q9  Q8  Q7  Q6  Q5  Q4  Q3  Q2  Q1  Q0        Commands

  1            No Command Necessary                       0
  2     1   0   0   0   0   0   0   0   0   0             1
  3     X   1   0   0   0   0   0   0   0   0             2
  4     X   X   1   0   0   0   0   0   0   0             4
  5     X   X   X   1   0   0   0   0   0   0             8
  6     X   X   X   X   1   0   0   0   0   0            16
  7     X   X   X   X   X   1   0   0   0   0            32
  8     X   X   X   X   X   X   1   0   0   0            64
  9     X   X   X   X   X   X   X   1   0   0           128
 10     X   X   X   X   X   X   X   X   1   0           256
 11     X   X   X   X   X   X   X   X   X   1           512

'X' indicates 'don't care' condition
TABLE 5

Pattern of Occurrence of Increment Command

                      CC Setting                       No. of
  I    Q9  Q8  Q7  Q6  Q5  Q4  Q3  Q2  Q1  Q0        Commands

  1            No Command Necessary                       0
  2            No Command Necessary                       0
  3     X   0   0   0   0   0   0   0   0   0             1
  4     X   X   0   0   0   0   0   0   0   0             3
  5     X   X   X   0   0   0   0   0   0   0             7
  6     X   X   X   X   0   0   0   0   0   0            15
  7     X   X   X   X   X   0   0   0   0   0            31
  8     X   X   X   X   X   X   0   0   0   0            63
  9     X   X   X   X   X   X   X   0   0   0           127
 10     X   X   X   X   X   X   X   X   0   0           255
 11     X   X   X   X   X   X   X   X   X   0           511
INTERCHANGE AFTER EVERY 0 to 1 TRANSITION OF $Q^C_{n-I}$   (I = 2, 3, ..., n)

INCREMENT AFTER EVERY 0 to 1 TRANSITION OF $Q^C_{n-I+i}$
(I = 3, 4, ..., n; i = 1, 2, ..., n-2)

where $i \le I-2$.

It is obvious that the interchange command can be looked upon as
the special case of increment commands for i = 0.
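
The rule can be exercised with a small Python simulation of the computation counter. This is a sketch under assumptions: the counter is taken to be an (n-1)-bit binary counter that starts at zero in every iteration, and a command is attributed to the single bit that rises on each count; the function name `command_counts` is hypothetical.

```python
def command_counts(n):
    """Return {iteration: (interchanges, increments)} for a 2**n-point FFT."""
    counts = {}
    states = 1 << (n - 1)                     # computations per iteration
    for iteration in range(1, n + 1):
        interchanges = increments = 0
        for state in range(1, states):        # state 0 never issues a command
            previous = state - 1
            rising = state & ~previous        # the one bit that went 0 -> 1
            bit = rising.bit_length() - 1
            if iteration >= 2 and bit == n - iteration:
                interchanges += 1             # transition of Q(n-I)
            elif iteration >= 3 and n - iteration < bit <= n - 2:
                increments += 1               # transition of Q(n-I+i), i >= 1
        counts[iteration] = (interchanges, increments)
    return counts

counts_2k = command_counts(11)                # the 2K-point case
```

For n = 11 the per-iteration counts come out as 1, 2, 4, ..., 512 interchanges and 1, 3, 7, ..., 511 increments, in line with the last columns of Tables 4 and 5.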

From the foregoing rules for the generation of different commands,
it is clear that an interchange command is to be generated for every 0 to 1
transition of $Q^C_{n-2}$ in the second iteration, and so on for the later
iterations. The exact logic for the generation of the interchange command
can be obtained in terms of the settings of the iteration register, which is
used to keep track of the iteration currently in progress, and of the
computation counter as follows.

Let $A_1, A_2, \ldots, A_n$ be the n-line output of the iteration register
(i.e. $A_I = 1$ only in the $I^{th}$ iteration and 0 for other iterations)
and let $C_{INT}$ be the interchange command variable given by the following
Boolean expression

$$C_{INT} = A_2 Q^C_{n-2} + A_3 Q^C_{n-3} + A_4 Q^C_{n-4} + \ldots + A_n Q^C_0$$    (2.9)

It is then obvious that an interchange command is to be generated
for every 0 to 1 transition of $C_{INT}$.

Similarly, the following expression can be obtained for the
increment command variable $C_{INC}$, whose 0 to 1 transitions will call for
the increments
$$C_{INC} = A_3 \bar{Q}^C_{n-3} + A_4 \bar{Q}^C_{n-4} + \ldots + A_n \bar{Q}^C_0$$    (2.10)


2.7 Precision and Speed Considerations

The total number of increment commands in a $2^n$-point FFT can be
obtained using Table 5 as

$$M_{INC} = 2^{n-1} - n$$    (2.11)

and the total number of butterfly computations involved is given by

$$M_{BUT} = n \cdot 2^{n-1}$$    (2.12)

Each butterfly computation and each increment involves a complex
multiplication, and each multiplication has an error which is due to the
truncation in multiplication. This error can be considered as a 'noise'
propagated throughout all the iterations. The magnitude of this error may
increase or decrease as it propagates. The ratio of the total number of
increment commands to the total number of computations is given by

$$\frac{M_{INC}}{M_{BUT}} = \frac{2^{n-1} - n}{n \cdot 2^{n-1}} \approx \frac{1}{n}$$    (2.13)

for large values of n. For a 2K-point FFT, this ratio is about 1/11, which
is rather small, and so the additional error introduced in the system by the
increment logic can be considered to be small.
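
As a quick numerical check of the ratio, the following Python sketch evaluates both counts for a given n, assuming the totals $2^{n-1} - n$ increment commands and $n \cdot 2^{n-1}$ butterflies; the function name is hypothetical.

```python
def increment_overhead(n):
    """Return (increment commands, butterflies, their ratio) for N = 2**n."""
    increments = 2 ** (n - 1) - n
    butterflies = n * 2 ** (n - 1)
    return increments, butterflies, increments / butterflies

# 2K-point case: n = 11.
overhead_2k = increment_overhead(11)
```

For n = 11 this gives 1013 extra multiplications against 11264 butterflies, so fewer than one butterfly in eleven is preceded by an increment step.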

Though the new weights are generated by the Arithmetic Unit itself,
the total time of this FFT for all iterations can be made the same as in the
conventional method of storing the weights in the ROM, where the weights
are readily available when the computation is started. In the 'Increment
Logic' also, weights can be made readily available when the computation is
started, provided the AU is sufficiently fast, by utilising the time taken
at the end of each computation to write the data back in the memory and
to read the new data. Since the memory cycle time is not too small, we
can trigger the AU to generate the new weights as soon as the computations
are over. Before the new data are read, the AU can finish the generation
of the new weight and is ready for the next computation. No overlapping is
possible between these as data and weights are in separate registers.

2.8 Block Diagram

The block diagram for the FFT AU is shown in Fig. 2.5. In this block
diagram, the adders, subtractors, multipliers, the associated registers and
the control circuits have already been made by Kanade. The additional
hardware required for the modified version of the weight generation, AU
clock generation, data display etc. is presented in the following
sections.

2.9 Counter-Decoder Scheme for Interchange/Increments

The interchange command can be generated by having a simple decoder
decoding for I and CC setting in accordance with the pattern shown in
Table 4. The decoder for the increment command should have (n-2) output
lines corresponding to the (n-2) distinct increments. From these lines, it
is possible to generate the numerical values of $\cos(-k_i\theta)$ and
$-\sin(k_i\theta)$ using an 'Increment Generator'. Even though this is
possible in principle, the size of the decoder becomes too large, as it is a
multi-output, multi-input decoder with (n-2) output lines and, as inputs,
n lines from the iteration register and (n-1) lines from the computation
counter.

Fig. 2.5 BLOCK DIAGRAM FOR FFT

Alternatively, we can use a sequential decoder in which an increment
counter is used to count all the increment commands corresponding to the
various distinct increments, and from the appropriate transitions of this
counter we can generate the (n-2) different lines. This increment counter is
just a ripple counter in which the 0 to 1 transition of $C_{INC}$ is used as
the clock. It is obvious from Table 1 that the $i^{th}$ distinct increment
is to be generated whenever the $i^{th}$ flip-flop of this counter makes a
0 to 1 transition. This scheme is shown schematically as a block diagram in
Fig. 2.6.

In this scheme, the decoders for $C_{INT}$ and $C_{INC}$ are single
output decoders which are made as per Tables 4 and 5 respectively, and
they are shown in Fig. A 2.1. Since only one of the n output variables of
the iteration register is '1' in a particular iteration, the decoders can be
realised in very simple AND-OR structures.

The increment counter is shown in Fig. A 2.2, where the top row of
flip-flops constitutes a ripple counter which just counts the number
of increment commands, and the bottom row of flip-flops registers the
0 to 1 transition of the corresponding flip-flop in the top row. When the
$i^{th}$ distinct increment is to be generated, only the $i^{th}$ flip-flop
in the bottom row is '1' and all the other flip-flops in that row are
zero. All the flip-flops in the bottom row are cleared by the increment
command itself, which goes to zero at the end of the computation. The top
row of flip-flops is cleared at the end of each iteration.

2.10 Increment Values for a 2K-point FFT

The number of distinct increments to be generated is 9 (= n-2) for
a 2K-point FFT. The values of $\cos(-k_i\theta)$ and $-\sin(k_i\theta)$
corresponding to these 9 distinct increments $k_1, k_2, \ldots, k_9$ have
been tabulated in Table 6, using a 16-bit sign-magnitude binary coding with
the sign bit appearing as the most significant bit. It is evident from
Table 6 that most of the bits in the real and imaginary parts are 1's and
0's respectively. A simple decoder can therefore be easily designed to
generate the appropriate values of $\cos(-k_i\theta)$ and $-\sin(k_i\theta)$,
which appear as two simultaneous outputs of the decoder having a 9-line
input representing the value of i. The circuit diagrams for the above
decoders for real and imaginary parts are shown in Figs. A 2.3 and A 2.4
respectively, where the shift registers are used as parallel to serial
buffers, since serial arithmetic has been used in the FFT AU.
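
The coding of the increment values can be illustrated with a short Python sketch. It rests on assumptions not confirmed by the text: the magnitude is truncated (not rounded) to 15 fractional bits, $\theta = 2\pi/N$, and $k_i = 3N/2^{i+2} - N/2$; the function names are hypothetical, and the bit strings produced need not agree with Table 6 in the last places.

```python
import math

def sign_magnitude(value, bits=16):
    """Encode a value with |value| < 1 as sign bit plus truncated magnitude."""
    magnitude_bits = bits - 1
    sign = '1' if value < 0 else '0'
    scaled = int(abs(value) * (1 << magnitude_bits))
    scaled = min(scaled, (1 << magnitude_bits) - 1)   # clamp |value| = 1.0
    return sign + format(scaled, '0{}b'.format(magnitude_bits))

def increment_factor(i, n):
    """Real and imaginary parts of exp(-j*k_i*theta) for a 2**n-point FFT."""
    n_points = 1 << n
    angle = (3 * n_points / 2 ** (i + 2) - n_points / 2) * 2 * math.pi / n_points
    return math.cos(angle), -math.sin(angle)

# Codes for the nine increments of the 2K-point case (n = 11).
codes = [tuple(sign_magnitude(part) for part in increment_factor(i, 11))
         for i in range(1, 10)]
```

As i grows the angle approaches $-\pi$, so the real part tends to -1 (magnitude bits mostly 1) and the imaginary part tends to 0 (magnitude bits mostly 0), which is the regularity a small decoder can exploit.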
Table 6

16-bit Representation for the Values of the
Real and Imaginary Parts of exp(-jk_iθ)

  i     k_iθ         cos(k_iθ)            -sin(k_iθ)

  1    -π/4        0 10110101000001     0 10110101000001
  2    -5π/8       1 01100001111101     0 11101100100001
  3    -13π/16     1 11010100110110     0 10001110001110
  4    -29π/32
governed by a control algorithm which has 12 steps, each step consisting of
16 pulses, as the word length has been chosen as 16 bits for a 2K-point FFT.

The control algorithm consists of 3 basic sub-cycles of 4 steps each.
The first sub-cycle is used for generating the new weight from the previous
weight using 'Increment Logic'. The complex multiplication of the second data
with the weight is done in the second sub-cycle, and the basic butterfly
operation is performed in the third sub-cycle.

It may sometimes be necessary during the testing of the equipment to
start the operation at any sub-cycle and perform the computations
corresponding to one full sub-cycle or just one step in a sub-cycle. This
leads to the necessity of generating a burst of 64 pulses or 16 pulses
respectively when a manual trigger for the AU is given. In the normal 'RUN'
mode this trigger for the burst is governed by the main control unit itself.

For faster AU operation, it may have to start from the first step
itself when the increment command is present, or from the fourth step if the
interchange command is present, or from the fifth step if none of the
commands is present. To provide all these facilities, synchronous
presettable counters are used, where the presetting is done by the above
commands in such a way that the burst starts from the desired step.

A gated oscillator using Schmitt Trigger NAND gates is made as the
clock generator, and the first clock pulse of this is used to preset the
counter. For the actual computation, this clock pulse should be disabled
and only the second and subsequent pulses are used. This provision is made
by a 7475 J-K flip-flop with suitable gates as shown in Fig. A 2.5.

To generate the burst of 16 or 64 or the whole set of pulses, a
7495 shift register is used, where the presetting is done in the required
way. The shift register overflows at the end of the required number of clock
pulses, and this overflow resets a flip-flop which stops the clock
generation. The burst is started by setting this flip-flop by a manual
trigger or the AU trigger from the control unit. Provision for the external
clock required to test the AU step by step is also made by using NAND gates
which select either the internal or the external clock. The timing diagram
for the above circuit is shown in Fig. A 2.6.

2.12 Buffer for Multipliers

There are four serial-parallel multipliers in the FFT AU designed
by Kanade using the 'add and shift' algorithm for multiplication. These
multipliers use only the magnitude parts of the multiplier and the
multiplicand for multiplication, and so the 16th bit of the multiplier and
multiplicand should be made zero before going to the multiplier. Since
each multiplier bit controls 16 bits of the multiplicand, buffers should
be added to increase the fan-out capacity. In each multiplier output,
the LSB comes in the output line and the remaining bits of the product
come in a separate line. Some form of gating is therefore essential to
combine all the bits in the proper order in a single line. D flip-flops
with suitable gates and buffers are used for this purpose, and the circuit
realisation is shown in Fig. A 2.7.
2.13 Display

For testing the FFT AU, some form of display of data is essential
when manual triggering is used to test each stage of the AU. Since each
data word consists of real and imaginary parts of 16 bits each, two sets of
16 LEDs are used for display. Since serial arithmetic is used, a serial
to parallel buffer is also provided to make the data available in
parallel for display. Since two sets of data, one trigonometric weight
and one increment are involved, AOI gates are used to select one of the
above four sets. The circuit diagram is shown in Fig. A 2.8.



CHAPTER III


FAST WALSH TRANSFORM GENERATOR


The Walsh Transform is quite useful in the applications of image processing
and pattern recognition. This transformation is binary in nature, making it
suitable for implementation in a special purpose computer. A second
advantage of the FWT over the FFT arises out of the fact that the algorithm
requires only $N \log_2 N$ additions, where N is the number of samples, and
no multiplication. In this chapter, the mathematical background of the FWT
and the hardware aspects are described in detail.

3.1 Mathematical background of Walsh Functions

Walsh functions are periodic with period N, where N is an integral
power of two. So the complete orthogonal set will have N distinct functions.
These functions are designated as Wal(m,n). The complete set is represented
over the range m = 0, 1, 2, ..., N-1 and n = 0, 1, 2, ..., N-1.

The first two discrete Walsh functions are defined as

$$\mathrm{Wal}(0,n) = 1 \quad \text{for } n = 0, 1, \ldots, N-1$$

$$\mathrm{Wal}(1,n) = \begin{cases} 1 & \text{for } n = 0, 1, 2, \ldots, N/2 - 1 \\ -1 & \text{for } n = N/2, N/2 + 1, \ldots, N-1 \end{cases}$$

The remainder of the set can be generated by an iterative equation

$$\mathrm{Wal}(m,n) = \mathrm{Wal}\big((m/2),\, 2n\big) \cdot \mathrm{Wal}\big(m - 2(m/2),\, n\big)$$    (3.1)

where $(m/2)$ indicates the integer part of $m/2$.
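
The iterative definition translates directly into a recursive Python function. This is a sketch under one stated assumption: the argument 2n is reduced modulo N, which follows from the periodicity of the functions; the names `wal` and `walsh_table` are hypothetical.

```python
def wal(m, n, n_points):
    """Discrete Walsh function Wal(m, n) for an N-point set."""
    n %= n_points                       # functions are periodic with period N
    if m == 0:
        return 1
    if m == 1:
        return 1 if n < n_points // 2 else -1
    # Eq. (3.1): Wal(m, n) = Wal((m/2), 2n) * Wal(m - 2(m/2), n)
    return wal(m // 2, 2 * n, n_points) * wal(m % 2, n, n_points)

def walsh_table(n_points):
    """All N functions, one row per value of m."""
    return [[wal(m, n, n_points) for n in range(n_points)]
            for m in range(n_points)]

table_4 = walsh_table(4)
```

For N = 4 the rows are (1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1) and (1, -1, -1, 1); any two distinct rows have a zero inner product, as required of an orthogonal set.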


3.2 Fast Walsh Transform

Given an N-point* real array f(n), we can define the Walsh Transform as

$$F(m) = \sum_{n=0}^{N-1} f(n)\,\mathrm{Wal}(m,n), \qquad m = 0, 1, \ldots, N-1$$    (3.2)

and the inverse transform as

$$f(n) = \frac{1}{N} \sum_{m=0}^{N-1} F(m)\,\mathrm{Wal}(n,m), \qquad n = 0, 1, \ldots, N-1$$    (3.3)

Since Walsh functions can take the values +1 and -1 only, the computation
of Eqn. (3.2) requires no multiplications. Using Eqn. (3.1), a computational
algorithm can be easily derived analogous to the Cooley-Tukey algorithm.
This algorithm will require $N \log_2 N$ summations to compute a complete
Walsh Transform, rather than the $N^2$ indicated by Eqn. (3.2). The
derivation of this algorithm parallels the one given by Cooley and Tukey,
and hence this algorithm is called the Fast Walsh Transform (FWT).

* In FWT literature the term 'N-length' is generally used
instead of 'N-point'. However, to have a uniform notation,
the term 'N-point' is used in this thesis for FFT as well as
FWT.
V 
3.3 Signal Flow Graph

The signal flow graph for an 8-point FWT is given in Fig. 3.1a.
The flow graph is similar to the one given in Fig. 2.2, differing only in
the weights. From this signal flow graph, one can convert some
Cooley-Tukey transform programs to the Walsh transform by setting all the
trigonometric values to 1.0 or by removing the steps which multiply the
array values by the trigonometric values. Also, the complex part of the
Cooley-Tukey operation can be removed since the weights are all real. The
inverse Walsh transform is identical to the Walsh Transform except that all
values are divided by N.

The final result $A_3$ will contain all the coefficients of the 8-point
Walsh Transform. However, just as in the Cooley-Tukey algorithm, the order
of values will be in bit reversed form.
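
The remark about the inverse transform can be tried out with a short Python sketch. The butterfly ordering used here (pair spacing N/2, N/4, ..., 1) is an assumption, since Fig. 3.1a is not reproduced in the text, and the function names are hypothetical. Running the same add/subtract graph a second time and dividing by N returns the original samples.

```python
def walsh_butterflies(values):
    """Apply the add/subtract flow graph to a copy of `values`."""
    data = list(values)
    half = len(data) // 2
    while half >= 1:
        for base in range(0, len(data), 2 * half):
            for first in range(base, base + half):
                second = first + half
                data[first], data[second] = (data[first] + data[second],
                                             data[first] - data[second])
        half //= 2
    return data

def inverse_walsh(coefficients):
    """Same flow graph, with every value divided by N."""
    n_points = len(coefficients)
    return [value / n_points for value in walsh_butterflies(coefficients)]

samples = [3, 1, 4, 1, 5, 9, 2, 6]
recovered = inverse_walsh(walsh_butterflies(samples))
```

No reordering is needed between the two passes in this arrangement: the second pass consumes the coefficients in the same order in which the first pass left them.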

3.4 Block Diagram for FWT

The organisation of the Fast Walsh Transform Generator is essentially
the same as that of a special purpose computer. It basically consists of
4 units, namely the I/O unit, the memory, the arithmetic unit and the
control unit. The analog signal to be transformed is digitised using an
A/D converter at the input, and these samples are stored in a Random Access
Memory (RAM). This RAM is used for storing the data, intermediate results
and the final results. The task of the AU is to carry out the computations,
and the main control unit controls the processor and the memory. The main
control unit also generates the addresses for the basic computations. The
block diagram for FWT is shown in Fig. 3.2.
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5 *5 The Basic Computation 

The algorithm is similar to the Cooley-Tukey algorithm where, for the
number of samples N = 2^n, the factoring is continued till there are N/2
groups of 2 points. These two-point transforms can be calculated easily
and they are combined to produce four-point transforms; four-point
transforms are then combined to produce eight-point transforms and so on.
If this process is continued, then after log2 N such steps, the complete
N-point transform is obtained. These steps are called iterations and in
each iteration there are N/2 basic computations.

The basic computation of FWT is now the generation of a two-point
transform. It consists of three steps: (i) accessing two numbers from the
memory locations, (ii) adding the two numbers and storing the sum in the
first location and (iii) subtracting the second number from the first
number and storing the difference in the second location. This basic
computational pattern is shown in Fig. 3.1b, where the two unprimed symbols
denote the data fetched from the first and second locations respectively,
and the corresponding primed symbols denote the new values generated by the
transformation, which are to be loaded back in the same locations.
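The three steps of the basic computation, and the count of N/2 basic computations in each of the log2 N iterations, can be sketched as follows. This is an illustration only: the names `basic_computation` and `run_iterations` are hypothetical, and the order in which the location pairs are visited is an assumption rather than the sequencing used by the hardware.

```python
def basic_computation(memory, first, second):
    """Two-point transform carried out in place on two memory locations."""
    a = memory[first]             # (i) fetch the two numbers
    b = memory[second]
    memory[first] = a + b         # (ii) sum goes back to the first location
    memory[second] = a - b        # (iii) difference goes to the second location


def run_iterations(memory):
    """Run all iterations in place; return (iterations, basic computations)."""
    n_points = len(memory)
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("number of points must be a power of 2")
    iterations = n_points.bit_length() - 1        # log2 N
    total = 0
    for k in range(iterations):
        span = 1 << k                             # assumed pairing distance
        for first in range(n_points):
            if first & span == 0:                 # first location of a pair
                basic_computation(memory, first, first | span)
                total += 1
    return iterations, total
```

For 8 points this performs 3 iterations of 4 basic computations each, that is 12 additions and 12 subtractions with no multiplication.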

3.6 1's Complement Versus 2's Complement

Basically the FWT AU consists of an adder and a subtractor with
proper AU control. These operations can be performed in serial or parallel
form. Parallel operations are naturally faster than the serial operations.
Since the arithmetic involved in FWT is not much, a parallel processor has
been chosen even though it is somewhat costlier than the serial processor.
Since subtractions are also involved in the system, either 1's complement
or 2's complement arithmetic has to be chosen. 1's complement is evidently
the better choice in this case, since the hardware required is much more
for 2's complement in a parallel processor. Since the processor is fast,
we need not have separate adders and subtractors, as one of them can do
the other function also by just changing the sign bit, thereby minimising
the hardware.

3.7 FWT Arithmetic Unit

The hardware realisation for FWT basically consists of a 16-bit
parallel 1's complement adder which is shown in Fig. A2.9. The two banks
of exclusive-OR (EX-OR) gates at the data inputs of the adder are mainly
provided for obtaining the 1's complement of the two data, which are in
sign-magnitude form when they are read into the data registers from the
memory. A bank of EX-OR gates at the output is used to obtain the 1's
complement of the result and to bring the resultant data back to the
sign-magnitude form before they are written back in the memory. 4-bit
binary full adders 7483 are used for the adders.

Since 1's complement operation is used, the final carry output of the
adder is given as the carry input of the full adder corresponding to the
LSB. This FWT AU is very fast as it is asynchronous, and the total time it
takes for one particular operation depends only upon the gate delays. The
addition and subtraction are performed in accordance with the sign bit of
FWT, which goes to one of the inputs of the EX-OR gate corresponding to the
MSB of the second data. Since the memory access time corresponding to the
data is longer than the actual AU time for one operation, the memory cycle
itself can be made to include the AU cycle also. During this cycle the
sign bit for the second data is switched in such a way that both addition
and subtraction are performed for a computation. This memory cycle will be
stopped only when all the iterations are over, and so the total memory cycle
time for all these iterations is the same as the total time that has been
used to perform the Walsh Transformation.
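The arithmetic described in this section can be modelled in software as a rough sketch. It assumes a 16-bit word made of one sign bit and a 15-bit magnitude, and it ignores overflow. The names `xor_bank`, `ones_complement_add` and `au_operation` are hypothetical, and the model does not represent the gate-level circuit of Fig. A2.9.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1          # 0xFFFF
SIGN = 1 << (WORD_BITS - 1)          # 0x8000, the sign bit (MSB)
MAGNITUDE = SIGN - 1                 # 0x7FFF


def to_sign_magnitude(value):
    """Encode a Python int as a sign-magnitude word (as stored in memory)."""
    if abs(value) > MAGNITUDE:
        raise ValueError("value does not fit in 15 magnitude bits")
    return (SIGN if value < 0 else 0) | abs(value)


def from_sign_magnitude(word):
    magnitude = word & MAGNITUDE
    return -magnitude if word & SIGN else magnitude


def xor_bank(word):
    """EX-OR every magnitude bit with the sign bit.

    Converts sign-magnitude to 1's complement and back again.
    """
    return word ^ MAGNITUDE if word & SIGN else word


def ones_complement_add(x, y):
    """16-bit addition with the final carry fed back into the LSB."""
    total = x + y
    if total > MASK:
        total = (total & MASK) + 1   # end-around carry
    return total & MASK


def au_operation(a, b, subtract):
    """Add or subtract two ints through the modelled AU data path."""
    a_word, b_word = to_sign_magnitude(a), to_sign_magnitude(b)
    if subtract:
        b_word ^= SIGN               # subtraction only flips the sign bit
    result = ones_complement_add(xor_bank(a_word), xor_bank(b_word))
    return from_sign_magnitude(xor_bank(result))
```

The same adder serves both operations: `au_operation(5, 3, False)` gives 8 and `au_operation(5, 3, True)` gives 2, the only difference being the sign bit presented for the second operand.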



CHAPTER IV


FWT AND FFT MAIN CONTROL UNIT


The main control unit generates the addresses for the butterfly
computations involved in FFT and FWT and also controls the data flow from
the FWT and FFT AU's to the memory. It also keeps track of the iteration
number and the computation number currently in progress. In the FFT mode
it also makes provision for scrambling the data to give decimation in time
or decimation in frequency, whichever is required. Modularity is also
introduced in the control unit so that the same unit works for any number
of points in FFT or FWT.

4.1 Iteration Register 

The main function of the Iteration Register (IR) is to keep track
of the iteration currently in progress. For an N (= 2^n) point FWT/FFT, the
total number of iterations required is n and so a shift register of n bits
can be used. Since the iteration number controls the other circuits like
the Address counter and the decoders for increment/interchange command
generation, a straight binary counter with n states is not preferable. The
IR is designed in such a way that the kth bit is '1' in the kth iteration
and the rest of the bits are zero. Modularity can be achieved by loading a
single '1' in a particular place of the shift register when the computation
starts. Whenever an iteration ends, this single '1' is shifted one place
to the right. When this '1' is shifted out of the shift register, it
signifies that all the n iterations are over and the process stops.
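The behaviour of the Iteration Register can be sketched as a one-hot shift register. This is an illustration under stated assumptions: the function name `iteration_states` is hypothetical, the register is modelled as an integer, and the choice of loading position (so that the '1' leaves the register after exactly n shifts) is an assumption about how the modular loading is arranged.

```python
def iteration_states(register_bits, n):
    """Return the successive one-hot IR states for a 2**n point transform.

    register_bits is the physical length of the shift register and n is
    the number of iterations required (n <= register_bits).
    """
    if not 1 <= n <= register_bits:
        raise ValueError("n must lie between 1 and the register length")
    state = 1 << (n - 1)          # load a single '1' when computation starts
    states = []
    while state:                  # stop once the '1' has been shifted out
        states.append(format(state, "0{}b".format(register_bits)))
        state >>= 1               # end of iteration: shift one place right
    return states
```

With a 4-bit register and n = 3 the states are 0100, 0010 and 0001, after which the register is empty and the process stops. Loading the '1' in a different place changes the number of iterations without changing the circuit.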

4.2 Address Generation

Both in FFT and FWT, each computation requires a pair of addresses.
These addresses follow a logical pattern as given by Kanade. In the kth
iteration, the kth bit of the first address is always '0', and the
corresponding bit of the second address is always '1'. If this bit is
suppressed, then the remaining bits are always in a natural binary sequence.
So the second address can always be generated from the first address, and
an Address counter of n bits for a 2^n-point FFT is required to generate
this first address.

A simple ripple counter with n flip-flops can be used for this
purpose. The clear terminals of these flip-flops are connected to the
inverted value of the corresponding IR bit. Thus if the kth bit of IR is
'1', then the kth flip-flop is cleared for the complete duration of the
iteration, thus making the kth bit of this address permanently zero
throughout that iteration. Instead of connecting the output of each
flip-flop only to the clock input of the next flip-flop as in a binary
ripple counter, it is also connected to the clock input of the flip-flop
after the next one in the line through a set of NAND gates as shown in
Fig. A2.10. In the kth iteration, these gates cause the output of the
(k-1)th flip-flop to be connected to the clock input of the (k+1)th
flip-flop, thus skipping the kth flip-flop entirely. For the other
flip-flops, these gates cause the output of the previous flip-flop to be
connected to the clock input of the next, thereby maintaining the natural
binary sequence.

Since only the kth bit of the IR is '1' in the kth iteration, the
individual bits of the second address can be obtained simply by means of a
set of 2-input OR gates, having the output of the address counter and the
corresponding bit of IR as their inputs.
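The address pattern described above can be sketched in software. The function name `address_pairs` is hypothetical, and numbering the bits from 0 at the LSB is an assumption made for the sketch; the thesis does not state which end of the address the kth bit is counted from.

```python
def address_pairs(n_bits, k):
    """Address pairs for the iteration in which IR bit k is '1'.

    The first address counts in natural binary with bit k held at 0
    (the counter stage that is cleared and skipped); the second address
    is the first address ORed with the one-hot IR word.
    """
    if not 0 <= k < n_bits:
        raise ValueError("k must be a valid bit position")
    ir = 1 << k                       # one-hot iteration register
    low_mask = ir - 1                 # bits below position k
    pairs = []
    for count in range(1 << (n_bits - 1)):
        # Keep the bits below k, move the rest up one place, leave bit k = 0.
        first = ((count & ~low_mask) << 1) | (count & low_mask)
        second = first | ir           # bank of 2-input OR gates
        pairs.append((first, second))
    return pairs
```

For a 3-bit address with k = 1 this yields the pairs (0, 2), (1, 3), (4, 6) and (5, 7), so every location is visited exactly once in the iteration.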

Since FFT contains both real and imaginary parts, the address can at
most have 11 bits for 2K points, while all the 12 bits can be used for the
address in FWT, as there is no imaginary part present in the data.

4.3 Address Scrambling

For a naturally ordered sequence of samples, the Fourier coefficients
and the Walsh coefficients do not occur in the natural order but in the
bit-reversed order. To access a particular coefficient, we must find the
reversed-bit integer of the address and access the coefficient from that
location. For example, in the 8-point FFT, the coefficient F(3) occurs
in location 6 because 6 (110) is the reversed-bit integer of 3 (011). So
the scrambling operation means reallocating the data from its present
address to a location whose address is given by the reverse-bit ordering
of the present address.

For FFT, the above process is called 'decimation in frequency' since
the Fourier coefficients occur in the reversed bit order. On the other
hand, if the Fourier coefficients are to be obtained in the natural sequence
of frequency, then the data or the record of samples must be scrambled in
the reverse bit order before the start of the computation. This is called
'decimation in time'. So in general, if decimation in frequency is
required, then the scrambling operation should be done after all the
computations are over, and if decimation in time is required, then the
scrambling operation should be done before the actual start of the
computations.
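The scrambling operation can be illustrated with a short sketch. The names `bit_reverse` and `scramble` are hypothetical. Swapping a pair only when the address is smaller than its reversed value is a common software formulation that exchanges each pair exactly once; it is an assumption here and not a description of the hardware sequencing.

```python
def bit_reverse(address, n_bits):
    """Return the n_bits-wide address with its bits in reverse order."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (address & 1)
        address >>= 1
    return result


def scramble(data):
    """Move every item to the bit-reversed counterpart of its address."""
    n_points = len(data)
    if n_points == 0 or n_points & (n_points - 1):
        raise ValueError("number of points must be a power of 2")
    n_bits = n_points.bit_length() - 1
    out = list(data)
    for i in range(n_points):
        j = bit_reverse(i, n_bits)
        if i < j:                    # exchange each pair only once
            out[i], out[j] = out[j], out[i]
    return out
```

Here `bit_reverse(3, 3)` is 6, in agreement with the 8-point example in the text. Scrambling is its own inverse, so applying `scramble` a second time returns the data to the order it had before scrambling.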

In the case of FWT, however, scrambling does not lead to an ordering
of the coefficients according to sequency, but leads to what is known as
'dyadic ordering'. To achieve sequency ordering for FWT (Ref. 7), the
addresses have first to be treated as being in the Gray code, and converted
therefrom to binary before scrambling is done. This not only requires extra
computation time, but also means that a sequency-ordered FWT cannot be
obtained using the FFT algorithm simply by bypassing the multiplications.
Thus from the viewpoint of the fast transform, dyadic ordering is
advantageous.
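The relation between the butterfly output order, dyadic order and sequency order can be checked numerically with a small sketch. It assumes that the unscrambled output follows the row order of the Hadamard matrix, and it maps an output address to its sequency by bit reversal followed by Gray-to-binary conversion. All function names are hypothetical, and the order in which the two steps are applied in the hardware addressing may differ from this address-to-sequency direction.

```python
def bit_reverse(address, n_bits):
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (address & 1)
        address >>= 1
    return result


def gray_to_binary(gray):
    """Convert a Gray-coded integer to natural binary."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


def hadamard_row(k, n_points):
    """Row k of the Hadamard matrix: entries +1 or -1."""
    return [-1 if bin(k & t).count("1") % 2 else 1 for t in range(n_points)]


def sign_changes(row):
    """Number of sign changes along a row (used here as the sequency index)."""
    return sum(1 for u, v in zip(row, row[1:]) if u != v)


def dyadic_index(address, n_bits):
    """Scrambling alone gives the dyadic index of an output address."""
    return bit_reverse(address, n_bits)


def sequency_index(address, n_bits):
    """Sequency needs the extra Gray-to-binary conversion."""
    return gray_to_binary(dyadic_index(address, n_bits))
```

For 8 points the sequencies found at output addresses 0 to 7 are 0, 7, 3, 4, 1, 6, 2, 5, whereas bit reversal alone gives the dyadic indices 0, 4, 2, 6, 1, 5, 3, 7. The extra conversion step is what the dyadic ordering avoids.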

Since a modular design for 512, 1024, 2048 and 4096 points is aimed
at, AOI gates are used in an appropriate manner to do the bit reversal
operation as shown in Fig. A2.12. While scrambling, the operation should be
continued only up to half the number of points, after which the operation
should be stopped, since continuing the scrambling will bring the
coefficients once again to the bit-reversed addresses as they were before
scrambling.

Since the RAM permits access to only a single word at a time,
one-of-two selectors are used to select one of the two addresses
corresponding to a butterfly. For the same reason, one-of-two selectors
are used to select either the normal address or the corresponding scrambled
address.


4.4 Memory Clock and Gating Signals

The control unit has to generate many signals like Read memory, Write
memory, load the data in data registers and so on. The pattern in which
these various signals occur is shown as a flow chart in Fig. 4.1.

Since only a single word can be accessed from the memory at a time,
the real and imaginary parts of the data for FFT have to be fetched one
after another from the memory. Since two data are involved in each
computation, a total of 4 accesses to memory are required for reading and
4 more for writing. Except for the last data in the last iteration, writing
the previous data is always followed by reading the next data and hence
a burst of 8 pulses can be used as the memory clock with the first 4 pulses
for writing and the next 4 pulses for reading.

There is a 'scramble' switch which starts the memory clock just
as in normal operation. Since scrambling does not involve any AU operation,
the memory clock is stopped only when the scrambling corresponding to
half the total number of points is over.

While data is read into registers a, b, c and d, the data should be
directed to one of the registers at a time and this gating is done by the
shift register 7495 with a set of NAND gates. For FWT, the data loaded in
registers b and d are not used, and while writing, the resultant data need
not be brought back to the registers as it can be directly written in the
memory since they are available in parallel. So one-of-two selectors
as shown in Fig. A2.15 are necessary to select either the FFT data or the
FWT data before writing in memory. A single control bit is used to
determine whether FFT or FWT is used, and this bit is realised by a SPDT
switch.

Buffers are provided at various places where the fan-out exceeds the
limit. In FFT, when the increment command is given, it takes some time for
the increment counter and decoder to settle down and so some delay is
necessary between the occurrence of the increment command and the loading
of the increment in the increment registers. To provide this delay and to
produce some very short pulses, monostables are used in many places. LED
displays are provided to indicate that all the computations are over and
all the scrambling operations are over.

In the present context, modularity means that the basic design
remains unchanged for any number of points provided it is a power of 2.
Since the control unit for FFT and FWT is highly symmetrical and repetitive,
it can be made to act for various numbers of points. It can be easily made
modular by a simple switch which causes the initial '1' to be loaded in
different bits of the iteration register and also selects the overflow
of an appropriate bit of the computation counter. The 'Increment Logic'
suggested also allows for the modular design, as the increments and the
pattern they follow are the same and all increments need not be used.

Fig. A2.4 WEIGHT GENERATION (IMAGINARY PART)

Fig. A2.7 BUFFER FOR MULTIPLIER






Fig. A2.8 DATA DISPLAY

Fig. A2.14 TIMING DIAGRAM FOR CONTROL UNIT





